Ruby for the Web

2011-10-19

Ruby is the best language in the world for interacting with the web, and I'm going to show you why.

This is a response to Python for the Web from gun.io, just because I cannot stand people use the term "best" without qualifying what aspect they refer to. And yes, "interacting with the web" is spongy enough to include all kinds of things.

In order to honor their "Most rights reserved." footer, I won't actually rewrite their post, but give a succinct counter to each point.

Interacting with Websites and APIs Using Ruby

First we'll handle two simple HTTP requests from the client side. For this we use the excellent REST Client gem, which can be installed via rubygems.

gem install rest-client
require 'rest-client'

puts RestClient.get('http://gnu.io')
require 'rest-client'

puts RestClient.get('https://YOURUSERNAME:PASSWORD@api.github.com/user')
require 'rest-client'

url = 'https://example.com/form'
data = {title: 'RoboCop', description: 'The best movie ever.'}
RestClient.post(url, data)

As before, you can use the basic auth syntax if you require basic or digest authentication.

Processing JSON in Ruby

Since JSON is in the stdlib, there is no need to install anything.

require 'json'
require 'rest-client'

c = RestClient.get('https://github.com/timeline.json')
j = JSON.parse(c)
j.each do |item|
  if repository = item['repository']
    puts repository['name']
  end
end

This also fixes a bug in the original code, as not every item in the timeline has a repository key.

Scraping the Web Using Ruby

Here I'll introduce you to Nokogiri, the binding for libxml2 and libxslt. The usage is heavily influenced by Hpricot and improves upon it in terms of speed, memory usage, accuracy, HTML correction, etc.

There is also a XML library in stdlib called REMXL, but there is not a single time I've used it without regrets.

So first of all install nokogiri, this also requires installation of libxml2 and libxslt on Linux, I have no idea about other systems, but the authors seem to have quite good documentation, so I'll leave the gritty details to them.

Here's how to do it on Arch Linux:

sudo pacman -S libxml2 libxslt
gem install nokogiri

And here's how to use it for HTML in combination with RestClient, although I'd personally use open-uri in this case for simplicity.

require 'nokogiri'
require 'rest-client'

tree = Nokogiri::HTML(RestClient.get('http://gun.io'))
tree.css('#frontsubtext').each do |element|
  puts element.text
end

Something that wasn't shown is how to use XPATH, since that's quite essential for most HTML and XML juggling, here we go:

require 'nokogiri'
require 'rest-client'

tree = Nokogiri::HTML(RestClient.get('http://gun.io'))
tree.xpath('//a').each do |element|
  puts "#{element.text} : #{element[:href]}"
end

Ruby Web Sites

And of course it's about time to plug my own project: Ramaze.

Let's make a little page equivalent of the gun.io example.

I won't go into much detail here, please check out the documentation, as it will answer any questions you have much better than I will be able to do here.

require 'ramaze'

class Home < Ramaze::Controller
  map '/'

  def index(*input)
    @output = input.join('/').upcase
    <<-'HTML'
<!DOCTYPE html>
<html>
  <head>
    <meta encoding="utf-8">
    <title>#{@output}</title>
  </head>
  <body>
    Your output is: #{@output}
  </body>
</html>
    HTML
  end
end

Ramaze.start